Robust PCA methods for complete and missing data

Author

  • Juha Karhunen
Abstract

In this paper, we consider and introduce methods for robust principal component analysis (PCA), including cases where the data contain missing values. PCA is a widely applied standard statistical method for data preprocessing, compression, and analysis. It is based on the second-order statistics of the data and is optimal for Gaussian data, but it is often applied to data sets whose probability distributions are unknown or non-Gaussian. PCA can be derived by minimizing the mean-square representation error or by maximizing variances under orthonormality constraints. However, these quadratic criteria are sensitive to outliers and long-tailed distributions, which may badly degrade the results given by PCA. We introduce robust methods for estimating either the PCA eigenvectors directly or the PCA subspace spanned by them. Experimental results show that our methods often provide better results than standard PCA when outliers are present in the data. Furthermore, we extend our methods to incomplete data with missing values. The problems arising in such cases have several features typical of nonlinear models.
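The outlier sensitivity described in the abstract can be illustrated with a small sketch. The code below is not the method proposed in the paper, whose details are not given here. It swaps in spherical PCA, a simple and well-known robust variant that centers the data at the coordinate-wise median and rescales every sample to unit length before the decomposition. The function names, the synthetic data, and the outlier placement are all assumptions made for illustration.

```python
import numpy as np

def standard_pca_direction(X):
    """First principal direction from the SVD of mean-centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

def spherical_pca_direction(X):
    """First direction after projecting median-centered samples onto the unit sphere.

    Illustrative robust variant (spherical PCA), NOT the paper's method.
    """
    Xc = X - np.median(X, axis=0)            # coordinate-wise median as robust center
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    U = Xc / np.maximum(norms, 1e-12)        # every sample now has unit length
    _, _, Vt = np.linalg.svd(U, full_matrices=False)
    return Vt[0]

def make_demo_data(seed=0):
    """Hypothetical data: elongated Gaussian cloud along x plus a few gross outliers."""
    rng = np.random.default_rng(seed)
    inliers = rng.normal(size=(200, 2)) * np.array([3.0, 0.5])
    outliers = rng.normal(size=(10, 2)) + np.array([0.0, 50.0])
    return np.vstack([inliers, outliers])

X = make_demo_data()
print("standard PCA :", standard_pca_direction(X))
print("spherical PCA:", spherical_pca_direction(X))
```

In this synthetic setting the inliers vary mostly along the first coordinate. The quadratic criterion lets the ten distant points dominate the variance, so the standard direction is pulled toward the second coordinate, while the normalized samples each contribute equally and the spherical estimate stays close to the first coordinate (up to sign).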

Similar resources

Principal Component Analysis of Process Datasets with Missing Values

Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Methods for handling missing data typically operate during data pre-processing, but can also occur during model building. This article considers missing data within the context of principal component analysis (PCA), whi...

A Robust PCA Algorithm for Building Representations from Panoramic Images

Appearance-based modeling of objects and scenes using PCA has been successfully applied in many recognition tasks. Robust methods which have made the recognition stage less susceptible to outliers, occlusions, and varying illumination have further enlarged the domain of applicability. However, much less research has been done in achieving robustness in the learning stage. In this paper, we prop...

Maximum likelihood analysis of the logistic regression model when predictor variable data are incomplete but auxiliary variables are available

Background and Objectives: Missing data exist in many studies, e.g. in regression models, and they decrease the model's efficacy. Many methods have been suggested for handling incomplete data: they have generally focused on missing outcome values. But covariate values can also be missing. Materials and Methods: In this paper we study the missing imputation by the EM algorithm and auxiliary varia...

Bayesian Robust PCA for Incomplete Data

We present a probabilistic model for robust principal component analysis (PCA) in which the observation noise is modelled by Student-t distributions that are independent for different data dimensions. A heavy-tailed noise distribution is used to reduce the negative effect of outliers. Intractability of posterior evaluation is solved using variational Bayesian approximation methods. We show expe...

Comparison of Bayesian and classical methods for estimating the parameters of the logistic regression model in the presence of missing values in covariates

Background and Aim: Logistic regression is an analytic tool widely used in medical and epidemiologic research. In many studies, we face data sets in which some of the data are not recorded. A simple way to deal with such "missing data" is to simply ignore the subjects with missing observations, and perform the analysis on cases for which complete data are available. Materials and Methods: We c...

Publication date: 2011